Frank Sun final project

First, load all of the data into a single object.

source("process_teams.R")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
nba_data <- all_teams_data()
nba_data_1 <- nba_data %>% 
  mutate_at(vars(-name), as.numeric)

The data has already been processed: each row is a player, and a metric (my ESPN league’s fantasy score) is calculated and accumulated over the season.

Here’s a visual:

source("plot.R")

plot <- plot_nba(nba_data_1)
plot

As you can see, most players are concentrated towards the bottom. More work can be done to uncover interesting trends. To start, given what is already plotted, it is straightforward to highlight the top n performers, in this case 20.

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
pivoted_nba <- pivot_nba(nba_data_1) %>% 
  arrange(desc(cumscore)) %>% 
  distinct(name, .keep_all = TRUE) %>% 
  head(20) %>% 
  print()
## # A tibble: 20 × 4
## # Groups:   name [20]
##    name      date       score cumscore
##    <chr>     <date>     <dbl>    <dbl>
##  1 jokicni01 2023-04-08  22      2689 
##  2 sabondo01 2023-04-09  25      2508 
##  3 embiijo01 2023-04-06  24      2458 
##  4 doncilu01 2023-04-07  12      2276.
##  5 tatumja01 2023-04-07  18      2274.
##  6 gilgesh01 2023-04-06  21.5    2162.
##  7 antetgi01 2023-04-04  31.5    2124.
##  8 vucevni01 2023-04-09  14.5    2074.
##  9 randlju01 2023-03-29   2      2005 
## 10 davisan02 2023-04-09  30.5    1922.
## 11 adebaba01 2023-04-09   8.5    1904.
## 12 siakapa01 2023-04-07  22      1884.
## 13 derozde01 2023-04-09  16.5    1848.
## 14 youngtr01 2023-04-07  38      1840.
## 15 edwaran01 2023-04-09  28      1832.
## 16 mobleev01 2023-04-09   6      1814.
## 17 foxde01   2023-04-09  13      1811 
## 18 claxtni01 2023-04-07  31.5    1806 
## 19 butleji01 2023-04-06  29      1795 
## 20 lillada01 2023-03-22  34      1785
plot2 <- pivot_nba(nba_data) %>%
    filter(name %in% pivoted_nba$name) %>% 
    ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
    geom_line() +
    theme_minimal() +
    theme(legend.position = "none") +
    labs(title = "Top 20 scorers", x = "Date", y = "Cumulative Score")

ggplotly(plot2, tooltip = c("y", "group"))

It is now clear that the highest line we saw previously belongs to Nikola Jokic. Many of the NBA’s top performers are here, although context is needed to work out who they actually are. Matching the “usernames” to real names is another layer of work that belongs back in the collection process. Interesting names here include Domantas Sabonis (the purple line second from the top) and Nic Claxton, who came somewhat out of nowhere in the 22-23 season.
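As a rough sketch of what that matching could look like, a small lookup vector can translate IDs into display names and fall back to the raw ID when no match exists. The `real_names` vector below is a hypothetical, hand-written stand-in for illustration; it is not produced by `process_teams.R`, and a real version would need to be built during data collection.

```r
# Hypothetical lookup from Basketball Reference IDs to display names.
# Only the three players named in the text above are included.
real_names <- c(
  jokicni01 = "Nikola Jokic",
  sabondo01 = "Domantas Sabonis",
  claxtni01 = "Nic Claxton"
)

# IDs to label; "unknown01" is a made-up ID with no entry in the lookup.
ids <- c("jokicni01", "sabondo01", "unknown01")

# Use the real name when we have one, otherwise keep the raw ID.
labels <- unname(ifelse(ids %in% names(real_names), real_names[ids], ids))
print(labels)
```

Falling back to the raw ID keeps every line in the plot labelled even when the lookup is incomplete.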

games_played_data <- all_games_player()
nba_data_3 <- pivot_nba(nba_data) %>% 
  filter(date == "2023-04-09" | date == "2023-02-16") %>% 
  group_by(name) %>% 
  summarise(total_score = max(cumscore), post_asg_score = max(cumscore) - min(cumscore))

nba_data_4 <- full_join(nba_data_3, games_played_data) %>% 
  mutate(avg_score = total_score / games_played, post_asg_avg = post_asg_score / after_all_star)
## Joining with `by = join_by(name)`
top_pg_scorers <- nba_data_4 %>% 
  arrange(desc(avg_score)) %>% 
  head(20) %>% 
  select(name, avg_score)
top_pg_scorers
## # A tibble: 20 × 2
##    name      avg_score
##    <chr>         <dbl>
##  1 jokicni01      39.0
##  2 embiijo01      37.2
##  3 doncilu01      34.5
##  4 davisan02      34.3
##  5 antetgi01      33.7
##  6 duranke01      32.0
##  7 gilgesh01      31.8
##  8 sabondo01      31.7
##  9 lillada01      30.8
## 10 tatumja01      30.7
## 11 jamesle01      30.6
## 12 curryst01      29.9
## 13 hardeja01      28.1
## 14 butleji01      28.0
## 15 willizi01      27.8
## 16 irvinky01      27.7
## 17 halibty01      27.3
## 18 porzikr01      27.0
## 19 leonaka01      26.6
## 20 markkla01      26.6
plot3 <- pivot_nba(nba_data) %>%
    filter(name %in% top_pg_scorers$name) %>% 
    ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
    geom_line() +
    theme_minimal() +
    theme(legend.position = "none") +
    labs(title = "Top 20 per-game scorers", x = "Date", y = "Cumulative Score")

ggplotly(plot3, tooltip = c("y", "group"))

Standout names here that weren’t in the previous graph include, most obviously, Zion Williamson, as well as Leonard and Durant; Lillard, who only just made the totals list at 20th, climbs to 9th. This lines up with the most common reason very talented players perform well per game but don’t accrue total stats, burning the managers who drafted them: long-term injuries.

top_pg_scorers_asg <- nba_data_4 %>% 
  arrange(desc(post_asg_score)) %>% 
  head(20) %>% 
  select(name, post_asg_score)
top_pg_scorers_asg
## # A tibble: 20 × 2
##    name      post_asg_score
##    <chr>              <dbl>
##  1 embiijo01           810 
##  2 sabondo01           792.
##  3 davisan02           715 
##  4 jokicni01           694.
##  5 butleji01           632 
##  6 ingrabr01           582.
##  7 tatumja01           570.
##  8 jacksja02           562.
##  9 bookede01           559 
## 10 siakapa01           555 
## 11 lavinza01           552 
## 12 antetgi01           551 
## 13 bridgmi01           548.
## 14 claxtni01           542.
## 15 vucevni01           541 
## 16 giddejo01           538 
## 17 leonaka01           534.
## 18 foxde01             534.
## 19 lopezbr01           526 
## 20 youngtr01           522.
plot4 <- pivot_nba(nba_data) %>%
    filter(name %in% top_pg_scorers_asg$name) %>% 
    ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
    geom_line() +
    theme_minimal() +
    theme(legend.position = "none") +
    labs(title = "Top 20 scorers after the all star break", x = "Date", y = "Cumulative Score")

ggplotly(plot4, tooltip = c("y", "group"))

We sometimes have a notion of “playoff winners”; here we look at players who saved their best for last. Embiid and Sabonis lead this list, with Jokic still in the top four, but new names include Mikal Bridges, whose mid-season trade to the Nets unlocked a new facet of his game, and Brandon Ingram, who shook off extensive injuries early in the season to finish strongly after the all star break.

top_risers <- nba_data_4 %>% 
  arrange(desc(post_asg_avg - avg_score)) %>% 
  head(20) %>% 
  select(name, avg_score, post_asg_avg)
top_risers
## # A tibble: 20 × 3
##    name      avg_score post_asg_avg
##    <chr>         <dbl>        <dbl>
##  1 hasleud01      3.07        19.5 
##  2 isaacjo01      8.45        18.5 
##  3 maledth01      9.58        16.8 
##  4 halibty01     27.3         34.2 
##  5 lawsoaj01      3.4         10.2 
##  6 kesslwa01     19.1         25.7 
##  7 pritcpa01      5.34        11.7 
##  8 hortota01     11.3         17.6 
##  9 theisda01      8.43        14.5 
## 10 mamuksa01      8.44        14.4 
## 11 colliza01     16.1         21.8 
## 12 mcgruro01      6.42        12.1 
## 13 willija06     17.5         23.1 
## 14 tillmxa01     12.5         18.1 
## 15 reaveau01     14.9         20.4 
## 16 azubuud01      6.74        12.2 
## 17 quickim01     16.1         21.4 
## 18 nworajo01      9.37        14.7 
## 19 whiteja03      2.09         7.33
## 20 sochaje01     12.8         17.9
plot5 <- pivot_nba(nba_data) %>%
    filter(name %in% top_risers$name) %>% 
    ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
    geom_line() +
    theme_minimal() +
    theme(legend.position = "none") +
    labs(title = "Top 20 most improved after the all star break", x = "Date", y = "Cumulative Score")

ggplotly(plot5, tooltip = c("y", "group"))

Players who improved after the all star break include some who went from almost nothing to something simply because their team started to tank towards the end of the season. More interesting are names like Kessler, Jalen Williams, and Zach Collins, who made big strides through general improvement or a new team situation.
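One way to separate those two groups would be to require a minimum number of games before ranking. The sketch below is written in base R on a toy data frame so it runs without the project’s helper scripts. The column names mirror `nba_data_4`, but the rows are made up and the thresholds `min_games` and `min_post_asg` are hypothetical choices, not values used in this analysis.

```r
# Toy stand-in for nba_data_4; these rows are invented for illustration.
toy <- data.frame(
  name           = c("player_a", "player_b", "player_c"),
  games_played   = c(7, 74, 63),
  after_all_star = c(2, 24, 20),
  avg_score      = c(3.0, 19.0, 16.0),
  post_asg_avg   = c(19.5, 25.5, 21.5),
  stringsAsFactors = FALSE
)

# Hypothetical cut-offs: a real analysis would need to tune these.
min_games    <- 40
min_post_asg <- 15

# Keep only players with enough games overall and after the break.
eligible <- toy[toy$games_played >= min_games &
                toy$after_all_star >= min_post_asg, ]

# Rank the remaining players by the gain in per-game average.
eligible$improvement <- eligible$post_asg_avg - eligible$avg_score
risers <- eligible[order(-eligible$improvement), c("name", "improvement")]
print(risers)
```

With these toy numbers, `player_a` has the largest raw jump but is dropped for playing too few games, which is the kind of case the filter is meant to catch.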

nba_data_5 <- all_teams_data(TRUE) # Modified to represent categorical
plot_nba(nba_data_5 %>% mutate_at(vars(-name), as.numeric))

A (somewhat crude) attempt to represent value in a categories league. No prizes for guessing who comes in first.

nba_data_6 <- nba_data_5 %>% 
  mutate_at(vars(-name), as.numeric)

pivoted_nba_1 <- pivot_nba(nba_data_6) %>% 
  arrange(desc(cumscore)) %>% 
  distinct(name, .keep_all = TRUE) %>% 
  head(20) %>% 
  print()
## # A tibble: 20 × 4
## # Groups:   name [20]
##    name      date        score cumscore
##    <chr>     <date>      <dbl>    <dbl>
##  1 jokicni01 2023-04-08  1.29      517.
##  2 embiijo01 2023-04-04 21.1       491.
##  3 gilgesh01 2023-04-04  3.63      481.
##  4 davisan02 2023-04-09 12.3       369.
##  5 butleji01 2023-04-06  3.03      336.
##  6 tatumja01 2023-04-04  0.174     323.
##  7 irvinky01 2023-04-05  9.39      316.
##  8 lillada01 2023-03-22  1.99      290.
##  9 duranke01 2023-04-02 10.4       289.
## 10 curryst01 2023-04-09  5.97      284.
## 11 doncilu01 2023-04-01 16.2       269.
## 12 porzikr01 2023-03-28 16.3       266.
## 13 jacksja02 2023-04-07 13.8       262.
## 14 halibty01 2023-03-25  7.13      257.
## 15 leonaka01 2023-04-08  4.45      252.
## 16 vanvlfr01 2023-04-04 13.1       252.
## 17 mitchdo01 2023-04-04 10.1       246.
## 18 hardeja01 2023-04-04  9.28      217.
## 19 sabondo01 2023-03-27  3.53      205.
## 20 bridgmi01 2023-04-02  4.43      188.
plot6 <- pivot_nba(nba_data_6) %>%
    filter(name %in% pivoted_nba_1$name) %>% 
    ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
    geom_line() +
    theme_minimal() +
    theme(legend.position = "none") +
    labs(title = "Top 20 category performers", x = "Date", y = "Cumulative Score")

ggplotly(plot6, tooltip = c("y", "group"))

Names that jump into the top 20 that couldn’t before include Irving, Curry, VanVleet, and Lillard (who previously missed too many games). This tracks with what we know about the difference in these scoring systems.

pivoted_nba_2 <- pivot_nba(nba_data_6) %>% 
  arrange(cumscore) %>% 
  distinct(name, .keep_all = TRUE) %>% 
  head(20) %>% 
  print()
## # A tibble: 20 × 4
## # Groups:   name [20]
##    name      date         score cumscore
##    <chr>     <date>       <dbl>    <dbl>
##  1 iveyja01  2023-04-05  -5.80     -376.
##  2 barrerj01 2023-04-09  -9.00     -368.
##  3 mathube01 2023-04-07  -5.89     -320.
##  4 banchpa01 2023-04-04  -5.96     -296.
##  5 greenja05 2023-04-09  -2.57     -285.
##  6 greenje02 2023-04-08 -10.6      -267.
##  7 brissos01 2023-04-09 -11.2      -248.
##  8 landajo01 2023-04-07  -6.80     -231.
##  9 martike04 2023-04-09  -7.14     -231.
## 10 osmande01 2023-04-09  -9.00     -229.
## 11 westbru01 2023-03-27  -1.40     -226.
## 12 poolejo01 2023-04-09  -0.415    -220.
## 13 monkma01  2023-03-27 -10.6      -219.
## 14 poweldw01 2023-04-02  -8.73     -218.
## 15 grahade01 2023-04-04  -4.72     -218.
## 16 marshna01 2023-04-09 -10.2      -212.
## 17 clarkjo01 2023-03-05  -0.756    -212.
## 18 kuminjo01 2023-04-07  -3.97     -212.
## 19 huntede01 2023-04-09  -5.77     -210.
## 20 drumman01 2023-04-09  -6.85     -209.
plot7 <- pivot_nba(nba_data_6) %>%
    filter(name %in% pivoted_nba_2$name) %>% 
    ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
    geom_line() +
    theme_minimal() +
    theme(legend.position = "none") +
    labs(title = "Bottom 20 category performers", x = "Date", y = "Cumulative Score")

ggplotly(plot7, tooltip = c("y", "group"))

The bottom category performers are more interesting than bottom points performers because these players actually play. These names are familiar to fans, possibly for reasons that aren’t so nice.

# Plot cumulative scores for a user-supplied vector of player IDs.
# The argument is named player_ids to avoid shadowing base::list().
search_and_plot <- function(player_ids) {
  searched_plot <- pivot_nba(nba_data) %>%
    filter(name %in% player_ids) %>% 
    ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
    geom_line() +
    theme_classic() +
    theme(legend.position = "none") +
    labs(title = "Searched Players", x = "Date", y = "Cumulative Score")

  ggplotly(searched_plot, tooltip = c("y", "group"))
}

search_and_plot(c("youngtr01", "willizi01", "bridgmi01", "poolejo01", "willija06"))

The last component lets the user input the names themselves, which unfortunately requires them to know each player’s Basketball Reference ID. Luckily, it follows a straightforward formula: the first five letters of the last name, plus the first two letters of the first name, plus an identifying number. If your letter combination is unique in NBA history, that number is probably 01.
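The formula above can be sketched as a small helper. `bref_id` is a hypothetical function written for illustration and is not part of the project’s scripts. It assumes plain ASCII names, and the number still has to be supplied by hand whenever it is not 01.

```r
# Hypothetical helper: build an ID from a first name, a last name,
# and the identifying number (defaults to 1, i.e. "01").
bref_id <- function(first, last, n = 1) {
  # Lower-case the name and drop anything that is not a letter a-z,
  # so apostrophes and hyphens do not end up in the ID.
  clean <- function(x) gsub("[^a-z]", "", tolower(x))
  paste0(
    substr(clean(last), 1, 5),      # first five letters of the last name
    substr(clean(first), 1, 2),     # first two letters of the first name
    sprintf("%02d", as.integer(n))  # two-digit identifying number
  )
}

bref_id("Nikola", "Jokic")        # "jokicni01"
bref_id("De'Aaron", "Fox")        # "foxde01"
bref_id("Jalen", "Williams", 6)   # "willija06"
```

The generated IDs match the ones that appear in the tables above, so the output can be passed straight to `search_and_plot()`.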